Postprocessing for Character Recognition Using Keyword Information
Authors
Abstract
We propose a new error correction method for postprocessing in character recognition that performs well even if the recognition accuracy is low. Conventional postprocessing uses knowledge of grammar and vocabulary. However, when recognition accuracy is low, such postprocessing cannot perform good correction because there are many candidates for the character string. Our method makes use not only of knowledge of grammar and vocabulary, but also of knowledge of the content of the document. In this method, we first automatically extract keywords using Zipf's law. Then we correct characters using those extracted keywords. We experimented on postprocessing for character recognition using keyword information. We show that this extraction of keywords works very well and that the recognition accuracy rises. When the recognition accuracy before postprocessing is greater than 90%, the restoration rate (the ratio of the number of corrected characters to the number of wrongly recognized characters) is higher than 70%.
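The abstract does not give the details of the keyword extraction or the correction step, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that keywords are the frequently repeated non-stopwords in a frequency ranking (a common Zipf-style heuristic), that correction replaces a recognized word with its closest keyword by string similarity, and that the recognized and ground-truth strings are aligned character by character. The names `extract_keywords`, `correct_words`, `restoration_rate`, and the parameters `min_count`, `stopwords`, and `cutoff` are all hypothetical.

```python
from collections import Counter
from difflib import get_close_matches


def extract_keywords(words, min_count=2, stopwords=frozenset()):
    """Rank words by frequency and keep repeated non-stopwords.

    Assumption: a simple frequency-rank heuristic stands in for the
    paper's Zipf's-law-based extraction, which the abstract does not detail.
    """
    counts = Counter(words)
    ranked = [word for word, _ in counts.most_common()]
    return [w for w in ranked if counts[w] >= min_count and w not in stopwords]


def correct_words(recognized, keywords, cutoff=0.8):
    """Replace each recognized word with its closest keyword, if any.

    Assumption: similarity is difflib's ratio; words with no keyword
    above `cutoff` are left unchanged.
    """
    corrected = []
    for word in recognized:
        match = get_close_matches(word, keywords, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return corrected


def restoration_rate(truth, recognized, corrected):
    """Corrected characters divided by wrongly recognized characters.

    Assumption: all three strings have equal length and are aligned
    position by position.
    """
    wrong = [i for i, (t, r) in enumerate(zip(truth, recognized)) if t != r]
    if not wrong:
        return 0.0  # nothing was misrecognized, so nothing to restore
    restored = sum(1 for i in wrong if corrected[i] == truth[i])
    return restored / len(wrong)


if __name__ == "__main__":
    text = "the keyword method uses keyword lists and the keyword method".split()
    keywords = extract_keywords(text, stopwords={"the", "and"})
    print(keywords)
    print(correct_words(["kevword", "methad", "uses"], keywords))
    print(restoration_rate("keyword", "kevwurd", "keywurd"))
```

A real system would work on the recognizer's candidate lists per character rather than on a single best string, and would combine the keyword evidence with the grammar and vocabulary knowledge the abstract mentions; this sketch leaves both out.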
Similar resources
An Approach for License Plate Recognition Based on Temporal Redundancy
Recognition of vehicle license plates is an important task that can be applied for a myriad of real scenarios. Most approaches in the literature first detect an on-track vehicle, locate the license plate, perform a segmentation of its characters and then recognize them using an Optical Character Recognition (OCR) approach. However, these approaches focus on performing these tasks using only a s...
The Postprocessing of Optical Character Recognition Based on Statistical Noisy Channel and Language Model
The techniques of image processing have been used in optical character recognition (OCR) for a long time. The recognition method evolved from early "pattern recognition" to "feature extraction" recently. The recognition rate is raised from 70% to 90%. But the character by character recognition technique has its limitation. Using language models to assist the OCR system in improving recognition ...
Post-processing based on Utterance Verification in Online Keyword Recognition for Multimedia Content Retrieval
In this paper, we propose an utterance verification-based postprocessing in online keyword recognition for multimedia content retrieval. The proposed post-processing technique verifies whether a candidate keyword segment can be categorized as a keyword. For this work, we employ a confidence measure based on the recognition results. In keyword recognition experiments, our approach achieved bette...
Improving OCR Performance in Biomedical Literature Retrieval through Preprocessing and Postprocessing
Today’s information retrieval (IR) techniques are mostly text-based. As a consequence, some types of information are beyond the reach of text-based IR systems, which fail in situations where textual information cannot be easily accessed, e.g. textual information in biomedical images and figures. To tackle such situations, we propose to augment IR systems with the ability to perform optical cha...
Image-based keyword recognition in oriental language document images
An algorithm is presented for keyword recognition in Oriental language document images. The objective is to recognize keywords composed of more than one consecutive character in document images where there are no explicit visually defined word boundaries. The technique exploits the redundancy expressed by the difference between the number of possible character strings of a fixed length and the...